Methods and apparatus for decoding a compressed HOA signal

ABSTRACT

Methods and apparatus for decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or soundfield. The method may include receiving a bit stream containing the compressed HOA representation and decoding, based on a determination that there are multiple layers, the compressed HOA representation from the bitstream to obtain a sequence of decoded HOA representations. A first subset of the sequence of decoded HOA representations is determined based only on corresponding ambient HOA components. A second subset of the sequence of decoded HOA representations is determined based on corresponding ambient HOA components and corresponding predominant sound components. For a frame k, the sequence of decoded HOA representations is represented at least in part by

${\widetilde{\hat{c}}}_{n}(k-1) = \begin{cases} \hat{c}_{\text{AMB},n}(k-1), & \text{for } n \text{ in the first subset} \\ \hat{c}_{n}(k-1) = \hat{c}_{\text{PS},n}(k-1) + \hat{c}_{\text{AMB},n}(k-1), & \text{for } n \text{ in the second subset} \end{cases}$

where

$\hat{c}_{\text{AMB},n}(k-1)$

corresponds to the corresponding ambient HOA components and

$\hat{c}_{\text{PS},n}(k-1)$

corresponds to the corresponding predominant sound components.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a division of U.S. Pat. Application No. 16/186,765, filed Nov. 12, 2018, which is a division of U.S. Pat. Application No. 15/127,545, filed Sep. 20, 2016, now U.S. Pat. 10,127,914, which is the U.S. National Stage of the International Application No. PCT/EP2015/055916, filed Mar. 20, 2015, which claims priority to European Patent Application No. 14305412.0, filed Mar. 21, 2014, each of which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

This invention relates to a method for compressing a Higher Order Ambisonics (HOA) signal, a method for decompressing a compressed HOA signal, an apparatus for compressing a HOA signal, and an apparatus for decompressing a compressed HOA signal.

BACKGROUND

Higher Order Ambisonics (HOA) offers a possibility to represent three-dimensional sound. Other known techniques are wave field synthesis (WFS) or channel based approaches like 22.2. In contrast to channel based methods, however, the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. This flexibility, however, is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only a few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.

HOA is based on the representation of the so-called spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients. These time domain functions will be equivalently referred to as HOA coefficient sequences or as HOA channels in the following. Usually, a spherical coordinate system is used where the x axis points to the frontal position, the y axis points to the left, and the z axis points to the top. A position in space x = (r, θ, ϕ)^(T) is represented by a radius r > 0 (i.e. the distance to the coordinate origin), an inclination angle θ ∈ [0, π] measured from the polar axis z and an azimuth angle ϕ ∈ [0, 2π[ measured counter-clockwise in the x - y plane from the x axis. Further, (·)^(T) denotes the transposition.
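As an illustration of the coordinate convention just described, the following minimal Python sketch converts a position (r, θ, ϕ) into Cartesian coordinates. The function name `spherical_to_cartesian` is a hypothetical helper introduced here for illustration only and is not part of the described method.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """Convert (radius r, inclination theta, azimuth phi) to (x, y, z).

    Convention taken from the text: x points to the front, y to the left,
    z to the top; theta is measured from the polar axis z, and phi is
    measured counter-clockwise in the x-y plane from the x axis.
    """
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z
```

With this convention, θ = π/2 and ϕ = 0 yields a point on the frontal x axis, and θ = π/2 and ϕ = π/2 yields a point on the y axis to the left.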

A more detailed description of the HOA coding is provided in the following. The Fourier transform of the sound pressure with respect to time, denoted by F_(t)(·), i.e.,

$P(\omega, x) = F_{t}\left(p(t, x)\right) = \int_{-\infty}^{\infty} p(t, x)\, e^{-\text{i}\omega t}\,\text{d}t$

with ω denoting the angular frequency and i indicating the imaginary unit, may be expanded into the series of Spherical Harmonics according to

$P(\omega = kc_{s}, r, \theta, \phi) = \sum_{n=0}^{N}\sum_{m=-n}^{n} A_{n}^{m}(k)\, j_{n}(kr)\, S_{n}^{m}(\theta, \phi).$

Here c_(s) denotes the speed of sound and k denotes the angular wavenumber, which is related to the angular frequency ω by

$k = \frac{\omega}{c_{s}}.$

Further, j_(n)(·) denote the spherical Bessel functions of the first kind and

S_(n)^(m)(θ, ϕ)

denote the real valued Spherical Harmonics of order n and degree m. The expansion coefficients

A_(n)^(m)(k)

only depend on the angular wavenumber k. Note that it has been implicitly assumed that sound pressure is spatially band-limited. Thus, the series is truncated with respect to the order index n at an upper limit N, which is called the order of the HOA representation. If the sound field is represented by a superposition of an infinite number of harmonic plane waves of different angular frequencies ω and arriving from all possible directions specified by the angle tuple (θ, ϕ), the respective plane wave complex amplitude function C(ω, θ, ϕ) can be expressed by the following Spherical Harmonics expansion:

$C(\omega = kc_{s}, \theta, \phi) = \sum_{n=0}^{N}\sum_{m=-n}^{n} C_{n}^{m}(k)\, S_{n}^{m}(\theta, \phi),$

where the expansion coefficients

C_(n)^(m)(k)

are related to the expansion coefficients

A_(n)^(m)(k)

by

A_(n)^(m)(k) = i^(n)C_(n)^(m)(k).

Assuming the individual coefficients

C_(n)^(m)(ω = kc_(s))

to be functions of the angular frequency ω, the application of the inverse Fourier transform (denoted by F⁻¹(·)) provides time domain functions

$c_{n}^{m}(t) = F_{t}^{-1}\left(C_{n}^{m}\left(\omega/c_{s}\right)\right) = \frac{1}{2\pi}\int_{-\infty}^{\infty} C_{n}^{m}\left(\frac{\omega}{c_{s}}\right) e^{\text{i}\omega t}\,\text{d}\omega$

for each order n and degree m, which can be collected in a single vector c(t) by

$c(t) = \left[ c_{0}^{0}(t)\;\; c_{1}^{-1}(t)\;\; c_{1}^{0}(t)\;\; c_{1}^{1}(t)\;\; c_{2}^{-2}(t)\;\; c_{2}^{-1}(t)\;\; c_{2}^{0}(t)\;\; \ldots\;\; c_{N}^{N-1}(t)\;\; c_{N}^{N}(t) \right]^{T}.$

The position index of a time domain function

c_(n)^(m)(t)

within the vector c(t) is given by n(n + 1) + 1 + m. The overall number of elements in the vector c(t) is given by O = (N + 1)². The discrete-time versions of the functions

c_(n)^(m)(t)

are referred to as Ambisonic coefficient sequences. A frame-based HOA representation is obtained by dividing all of these sequences into frames C(k) of length B and frame index k as follows:

$C(k) := \begin{bmatrix} c\left((kB+1)T_{\text{S}}\right) & c\left((kB+2)T_{\text{S}}\right) & \ldots & c\left((kB+B)T_{\text{S}}\right) \end{bmatrix},$

where T_(S) denotes the sampling period. The frame C(k) itself can then be represented as a composition of its individual rows c_(i)(k), i = 1, ..., O, as

$C(k) = \begin{bmatrix} c_{1}(k) \\ c_{2}(k) \\ \vdots \\ c_{O}(k) \end{bmatrix}$

with c_(i)(k) denoting the frame of the Ambisonic coefficient sequence with position index i.
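The indexing and framing rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the list-of-rows data layout (`samples[i][l]` holding coefficient sequence i + 1 at time (l + 1)·T_S) are hypothetical choices made here, not part of the described method.

```python
def position_index(n, m):
    """1-based position of c_n^m(t) within the vector c(t): n(n + 1) + 1 + m."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + 1 + m

def num_coefficients(order):
    """Number of coefficient sequences O = (N + 1)^2 for HOA order N."""
    return (order + 1) ** 2

def frame(samples, k, frame_length):
    """Return frame C(k) as a list of O rows with B samples each.

    Assumed layout: samples[i][l] is coefficient sequence i + 1 at time
    (l + 1) * T_S, so frame k covers sample numbers kB + 1 .. kB + B,
    i.e. the 0-based slice kB .. kB + B - 1.
    """
    start = k * frame_length
    return [row[start:start + frame_length] for row in samples]
```

For example, `position_index(4, 4)` equals `num_coefficients(4)`, since the coefficient of highest order and degree occupies the last position of c(t).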

The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, in particular O = (N + 1)². For example, typical HOA representations using order N = 4 require O = 25 HOA (expansion) coefficients. According to these considerations, the total bit rate for the transmission of a HOA representation, given a desired single-channel sampling rate f_(s) and the number of bits N_(b) per sample, is determined by O · f_(s) · N_(b). Consequently, transmitting a HOA representation of order N = 4 with a sampling rate of f_(s) = 48 kHz employing N_(b) = 16 bits per sample results in a bit rate of 19.2 Mbit/s, which is very high for many practical applications, e.g. streaming. Thus, compression of HOA representations is highly desirable.
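The bit rate figure quoted above follows directly from O · f_(s) · N_(b). The short Python sketch below reproduces the arithmetic; the function name `hoa_bit_rate` is a hypothetical helper used only for this illustration.

```python
def hoa_bit_rate(order, sampling_rate_hz, bits_per_sample):
    """Uncompressed bit rate O * f_s * N_b in bits per second."""
    num_channels = (order + 1) ** 2  # O = (N + 1)^2
    return num_channels * sampling_rate_hz * bits_per_sample

# Order N = 4, f_s = 48 kHz, N_b = 16 bits per sample.
rate = hoa_bit_rate(order=4, sampling_rate_hz=48_000, bits_per_sample=16)
print(rate / 1e6)  # 19.2 (Mbit/s)
```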

Previously, the compression of HOA sound field representations was proposed in the European Patent applications EP2743922A, EP2665208A and EP2800401A. These approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional and a residual ambient component.

The final compressed representation is assumed to comprise, on the one hand, a number of quantized signals, which result from the perceptual coding of the directional signals, and relevant coefficient sequences of the ambient HOA component. On the other hand, it is assumed to comprise additional side information related to the quantized signals, which is necessary for the reconstruction of the HOA representation from its compressed version.

Further, a similar method is described in ISO/IEC JTC1/SC29/WG11 N14264 (Working draft 1-HOA text of MPEG-H 3D audio, January 2014, San Jose), where the directional component is extended to a so-called predominant sound component. Like the directional component, the predominant sound component is assumed to be partly represented by directional signals, i.e. monaural signals with a corresponding direction from which they are assumed to impinge on the listener, together with some prediction parameters to predict portions of the original HOA representation from the directional signals. Additionally, the predominant sound component is supposed to be represented by so-called vector based signals, meaning monaural signals with a corresponding vector which defines the directional distribution of the vector based signals. The known compressed HOA representation consists of I quantized monaural signals and some additional side information, wherein a fixed number O_(MIN) out of these I quantized monaural signals represent a spatially transformed version of the first O_(MIN) coefficient sequences of the ambient HOA component C_(AMB)(k - 2). The type of the remaining I - O_(MIN) signals can vary between successive frames, and be either directional, vector based, empty or representing an additional coefficient sequence of the ambient HOA component C_(AMB)(k - 2).

A known method for compressing a HOA signal representation with input time frames (C(k)) of HOA coefficient sequences includes spatial HOA encoding of the input time frames and subsequent perceptual encoding and source encoding. The spatial HOA encoding 100, as shown in FIG. 1A, comprises performing Direction and Vector Estimation processing of the HOA signal in a Direction and Vector Estimation block 101, wherein data comprising first tuple sets M_(DIR)(k) for directional signals and second tuple sets M_(VEC)(k) for vector based signals are obtained. Each of the first tuple sets comprises an index of a directional signal and a respective quantized direction, and each of the second tuple sets comprises an index of a vector based signal and a vector defining the directional distribution of the signals. A next step is decomposing 103 each input time frame of the HOA coefficient sequences into a frame of a plurality of predominant sound signals X_(PS)(k - 1) and a frame of an ambient HOA component C_(AMB)(k - 1), wherein the predominant sound signals X_(PS)(k - 1) comprise said directional sound signals and said vector based sound signals. The decomposing further provides prediction parameters ξ(k - 1) and a target assignment vector v_(A,T)(k - 1). The prediction parameters ξ(k - 1) describe how to predict portions of the HOA signal representation from the directional signals within the predominant sound signals X_(PS)(k - 1) so as to enrich predominant sound HOA components, and the target assignment vector v_(A,T)(k - 1) contains information about how to assign the predominant sound signals to a given number I of channels.

The ambient HOA component C_(AMB)(k - 1) is modified 104 according to the information provided by the target assignment vector v_(A,T)(k - 1), wherein it is determined which coefficient sequences of the ambient HOA component are to be transmitted in the given number I of channels, depending on how many channels are occupied by predominant sound signals. A modified ambient HOA component C_(M,A)(k - 2) and a temporally predicted modified ambient HOA component C_(P,M,A)(k - 1) are obtained. Also a final assignment vector v_(A)(k - 2) is obtained from information in the target assignment vector v_(A,T)(k - 1). The predominant sound signals X_(PS)(k - 1) obtained from the decomposing, and the determined coefficient sequences of the modified ambient HOA component C_(M,A)(k - 2) and of the temporally predicted modified ambient HOA component C_(P,M,A)(k - 1) are assigned to the given number of channels, using the information provided by the final assignment vector v_(A)(k - 2), wherein transport signals y_(i)(k - 2), i = 1, ..., I and predicted transport signals y_(P,i)(k - 2), i = 1, ..., I are obtained. Then, gain control (or normalization) is performed on the transport signals y_(i)(k - 2) and the predicted transport signals y_(P,i)(k - 2), wherein gain modified transport signals z_(i)(k - 2), exponents e_(i)(k - 2) and exception flags β_(i)(k - 2) are obtained.

As shown in FIG. 1B, the perceptual encoding and source encoding comprises perceptual coding of the gain modified transport signals z_(i)(k - 2), wherein perceptually encoded transport signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right),\,\, i = 1,\,...,\, I$

are obtained, and encoding side information comprising said exponents e_(i)(k - 2) and exception flags β_(i)(k - 2), the first and second tuple sets M_(DIR)(k), M_(VEC)(k), the prediction parameters ξ(k - 1) and the final assignment vector v_(A)(k - 2), wherein encoded side information

$\overset{\smile}{\text{Γ}}\left( {k - 2} \right)$

is obtained. Finally, the perceptually encoded transport signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right)$

and the encoded side information are multiplexed into a bitstream.

SUMMARY OF THE INVENTION

One drawback of the proposed HOA compression method is that it provides a monolithic (i.e. non-scalable) compressed HOA representation. For certain applications, like broadcasting or internet streaming, it is however desirable to be able to split the compressed representation into a low quality base layer (BL) and a high quality enhancement layer (EL). The base layer is supposed to provide a low quality compressed version of the HOA representation, which can be decoded independently of the enhancement layer. Such a BL should typically be highly robust against transmission errors, and be transmitted at a low data rate in order to guarantee a certain minimum quality of the decompressed HOA representation even under bad transmission conditions. The EL contains additional information to improve the quality of the decompressed HOA representation.

The present invention provides a solution for modifying existing HOA compression methods so as to be able to provide a compressed representation that comprises a (low quality) base layer and a (high quality) enhancement layer. Further, the present invention provides a solution for modifying existing HOA decompression methods so as to be able to decode a compressed representation that comprises at least a low quality base layer that is compressed according to the invention.

One improvement relates to obtaining a self-contained (low quality) base layer. According to the invention, the O_(MIN) channels that are supposed to contain a spatially transformed version of the (without loss of generality) first O_(MIN) coefficient sequences of the ambient HOA component C_(AMB)(k - 2) are used as the base layer. An advantage of selecting the first O_(MIN) channels for forming a base layer is their time-invariant type. However, conventionally the respective signals lack any predominant sound components, which are essential for the sound scene. This is also clear from the conventional computation of the ambient HOA component C_(AMB)(k - 1), which is carried out by subtraction of the predominant sound HOA representation C_(PS)(k - 1) from the original HOA representation C(k - 1) according to

C_(AMB)(k − 1) = C(k − 1) − C_(PS)(k − 1)

Therefore, one improvement of the invention relates to the addition of such predominant sound components. According to the invention, a solution to this problem is the inclusion of predominant sound components at a low spatial resolution into the base layer. For this purpose, the ambient HOA component C_(AMB)(k - 1) that is output by a HOA Decomposition processing in the spatial HOA encoder according to the invention is replaced by a modified version thereof. The modified ambient HOA component comprises in the first O_(MIN) coefficient sequences, which are supposed to be always transmitted in a spatially transformed form, the coefficient sequences of the original HOA component. This improvement of the HOA Decomposition processing can be seen as an initial operation for making the HOA compression work in a layered mode (for example dual layer mode). This mode provides e.g. two bit streams, or a single bit stream that can be split up into a base layer and an enhancement layer. Using or not using this mode is signaled by a mode indication bit (e.g. a single bit) in access units of the total bit stream.
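The modification of the ambient HOA component described above can be sketched as follows. This is a minimal Python illustration, assuming each frame is stored as a list of O rows (one per coefficient sequence) with B samples each; the function name `ambient_component` and the boolean `layered` flag are hypothetical names chosen here, and the spatial transform and fading steps are deliberately left out.

```python
def ambient_component(c, c_ps, o_min, layered):
    """Compute the ambient HOA frame from an original frame and its PS part.

    c      : original HOA frame C(k - 1), list of O rows
    c_ps   : predominant sound HOA frame C_PS(k - 1), same shape
    o_min  : number O_MIN of coefficient sequences that are always transmitted
    layered: True if the layered mode is used (illustrative flag)
    """
    # Conventional computation: C_AMB(k - 1) = C(k - 1) - C_PS(k - 1).
    c_amb = [[x - p for x, p in zip(row, ps_row)]
             for row, ps_row in zip(c, c_ps)]
    if layered:
        # Layered mode: the first O_MIN rows carry the original coefficient
        # sequences, so the base layer retains predominant sound content.
        for n in range(o_min):
            c_amb[n] = list(c[n])
    return c_amb
```

In the layered case only the first O_(MIN) rows differ from the conventional result; all other rows remain the residual C − C_(PS).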

In one embodiment, the base layer bit stream

${\overset{\smile}{\text{B}}}_{\text{BASE}}\left( {k - 2} \right)$

only includes the perceptually encoded signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}},$

and the corresponding coded gain control side information, which consists of the exponents e_(i)(k - 2) and the exception flags β_(i)(k - 2), i = 1, ..., O_(MIN). The remaining perceptually encoded signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$

and the encoded remaining side information are included into the enhancement layer bit stream. In one embodiment, the base layer bit stream

${\overset{\smile}{\text{B}}}_{\text{BASE}}\left( {k - 2} \right)$

and the enhancement layer bit stream

${\overset{\smile}{\text{B}}}_{\text{ENH}}\left( {k - 2} \right)$

are then jointly transmitted instead of the former total bit stream

$\overset{\smile}{\text{B}}\left( {k - 2} \right)$

.
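The split of one access unit into a base layer and an enhancement layer can be sketched as follows. This is a minimal Python illustration under stated assumptions: the dictionary layout, the function name `split_layers` and its parameter names are hypothetical and do not reflect any actual bit stream syntax; only the partitioning rule (first O_(MIN) signals with their gain control side information in the base layer, everything else in the enhancement layer) is taken from the text.

```python
def split_layers(encoded_signals, gain_side_info, other_side_info, o_min):
    """Partition one frame's payload into base and enhancement layers.

    encoded_signals : list of the I perceptually encoded signals
    gain_side_info  : list of I (exponent, exception_flag) pairs
    other_side_info : remaining side information (tuple sets, prediction
                      parameters, assignment vector), passed through as-is
    o_min           : number O_MIN of channels forming the base layer
    """
    base = {
        "signals": encoded_signals[:o_min],
        "gain_side_info": gain_side_info[:o_min],
    }
    enhancement = {
        "signals": encoded_signals[o_min:],
        "gain_side_info": gain_side_info[o_min:],
        "other_side_info": other_side_info,
    }
    return base, enhancement
```

Because the base layer holds only channels 1 to O_(MIN) and their gain control data, it can be processed without any of the enhancement layer content.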

A non-transitory computer readable storage medium having executable instructions to cause a computer to perform a method for compressing a Higher Order Ambisonics (HOA) signal representation having time frames of HOA coefficient sequences is disclosed as described herein.

A non-transitory computer readable storage medium having executable instructions to cause a computer to perform a method for decompressing a Higher Order Ambisonics (HOA) signal representation having time frames of HOA coefficient sequences is disclosed as described herein.

Methods and apparatus for decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or soundfield. The method may include receiving a bit stream containing the compressed HOA representation and decoding, based on a determination that there are multiple layers, the compressed HOA representation from the bitstream to obtain a sequence of decoded HOA representations. A first subset of the sequence of decoded HOA representations is determined based only on corresponding ambient HOA components. A second subset of the sequence of decoded HOA representations is determined based on corresponding ambient HOA components and corresponding predominant sound components. For a frame k, the sequence of decoded HOA representations is represented at least in part by

${\widetilde{\hat{c}}}_{n}(k-1) = \begin{cases} \hat{c}_{\text{AMB},n}(k-1), & \text{for } n \text{ in the first subset} \\ \hat{c}_{n}(k-1) = \hat{c}_{\text{PS},n}(k-1) + \hat{c}_{\text{AMB},n}(k-1), & \text{for } n \text{ in the second subset} \end{cases}$

where

$\hat{c}_{\text{AMB},n}(k-1)$

corresponds to the corresponding ambient HOA components and

$\hat{c}_{\text{PS},n}(k-1)$

corresponds to the corresponding predominant sound components.

An indication of the multiple layers is signalled in the bitstream. The multiple layers include a base layer and at least an enhancement layer that are independently decodable of one another. The first subset is determined based on 1 ≤ n ≤ O_(MIN) and the second subset is determined based on O_(MIN) + 1 ≤ n ≤ O, wherein O indicates a total number of channels and O_(MIN) indicates a number between 1 and O.
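The piecewise reconstruction rule above can be sketched in Python as follows. This is a minimal illustration for the layered case only, assuming decoded frames are stored as lists of O rows; the function name `reconstruct_frame` and its parameter names are hypothetical.

```python
def reconstruct_frame(c_amb, c_ps, o_min):
    """Combine decoded ambient and predominant sound frames (layered mode).

    c_amb : decoded ambient HOA frame, list of O rows
    c_ps  : decoded predominant sound HOA frame, same shape
    o_min : O_MIN; rows 1..O_MIN form the first subset (ambient only),
            rows O_MIN + 1..O form the second subset (PS + ambient)
    """
    out = []
    for n, (amb_row, ps_row) in enumerate(zip(c_amb, c_ps), start=1):
        if n <= o_min:
            # First subset: ambient component only.
            out.append(list(amb_row))
        else:
            # Second subset: predominant sound plus ambient component.
            out.append([a + p for a, p in zip(amb_row, ps_row)])
    return out
```

The first subset omits the predominant sound term because, in layered mode, the first O_(MIN) ambient coefficient sequences already carry the original coefficient sequences, so adding the predominant sound part again would count it twice.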

Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention are described with reference to the accompanying drawings as follows:

FIGS. 1A and 1B illustrate an exemplary structure of a conventional architecture of a HOA compressor;

FIGS. 2A and 2B illustrate an exemplary structure of a conventional architecture of a HOA decompressor;

FIG. 3 illustrates an exemplary structure of an architecture of a spatial HOA encoding and perceptual encoding portion of a HOA compressor according to one embodiment of the invention;

FIG. 4 illustrates an exemplary structure of an architecture of a source coder portion of a HOA compressor according to one embodiment of the invention;

FIG. 5 illustrates an exemplary structure of an architecture of a perceptual decoding and source decoding portion of a HOA decompressor according to one embodiment of the invention;

FIG. 6 illustrates an exemplary structure of an architecture of a spatial HOA decoding portion of a HOA decompressor according to one embodiment of the invention;

FIG. 7 illustrates an exemplary transformation of frames from ambient HOA signals to modified ambient HOA signals;

FIG. 8 illustrates a flow-chart of a method for compressing a HOA signal;

FIG. 9 illustrates a flow-chart of a method for decompressing a compressed HOA signal; and

FIG. 10 illustrates details of parts of an exemplary architecture of a spatial HOA decoding portion of a HOA decompressor according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

For easier understanding, prior art solutions in FIGS. 1A, 1B and FIGS. 2A and 2B are recapitulated in the following.

FIGS. 1A and 1B show the structure of a conventional architecture of a HOA compressor. In a method described in [4], the directional component is extended to a so-called predominant sound component. Like the directional component, the predominant sound component is assumed to be partly represented by directional signals, meaning monaural signals with a corresponding direction from which they are assumed to impinge on the listener, together with some prediction parameters to predict portions of the original HOA representation from the directional signals. Additionally, the predominant sound component is supposed to be represented by so-called vector based signals, meaning monaural signals with a corresponding vector which defines the directional distribution of the vector based signals. The overall architecture of the HOA compressor proposed in [4] is illustrated in FIGS. 1A and 1B. It can be subdivided into a spatial HOA encoding part depicted in FIG. 1A and a perceptual and source encoding part depicted in FIG. 1B. The spatial HOA encoder provides a first compressed HOA representation consisting of I signals together with side information describing how to create an HOA representation thereof. In the perceptual and side info source coder the mentioned I signals are perceptually encoded and the side information is subjected to source encoding, before multiplexing the two coded representations.

Conventionally, the spatial encoding works as follows.

In a first step, the k-th frame C(k) of the original HOA representation is input to a Direction and Vector Estimation processing block, which provides the tuple sets M_(DIR)(k) and M_(VEC)(k). The tuple set M_(DIR)(k) consists of tuples of which the first element denotes the index of a directional signal and of which the second element denotes the respective quantized direction. The tuple set M_(VEC)(k) consists of tuples of which the first element indicates the index of a vector based signal and of which the second element denotes the vector defining the directional distribution of the signals, i.e. how the HOA representation of the vector based signal is computed.

Using both tuple sets M_(DIR)(k) and M_(VEC)(k), the initial HOA frame C(k) is decomposed in the HOA Decomposition into the frame X_(PS)(k - 1) of all predominant sound (i.e. directional and vector based) signals and the frame C_(AMB)(k - 1) of the ambient HOA component. Note the delay 102 of one frame, respectively, which is due to overlap add processing in order to avoid blocking artifacts. Furthermore, the HOA Decomposition is assumed to output some prediction parameters ζ(k - 1) describing how to predict portions of the original HOA representation from the directional signals in order to enrich the predominant sound HOA component. Additionally, a target assignment vector v_(A,T)(k - 1) containing information about the assignment of predominant sound signals, which were determined in the HOA Decomposition processing block, to the I available channels is provided. The affected channels can be assumed to be occupied, meaning they are not available to transport any coefficient sequences of the ambient HOA component in the respective time frame.

In the Ambient Component Modification processing block, the frame C_(AMB)(k - 1) of the ambient HOA component is modified according to the information provided by the target assignment vector v_(A,T)(k - 1). In particular, it is determined which coefficient sequences of the ambient HOA component are to be transmitted in the given I channels, depending, amongst other aspects, on the information (contained in the target assignment vector v_(A,T)(k - 1)) about which channels are available and not already occupied by predominant sound signals. Additionally, a fade in and out of coefficient sequences is performed if the indices of the chosen coefficient sequences vary between successive frames.

Furthermore, it is assumed that the first O_(MIN) coefficient sequences of the ambient HOA component C_(AMB)(k - 2) are always chosen to be perceptually coded and to be transmitted, where O_(MIN) = (N_(MIN) + 1)² with N_(MIN) ≤ N being typically a smaller order than that of the original HOA representation. In order to de-correlate these HOA coefficient sequences, it is proposed to transform them to directional signals (i.e. general plane wave functions) impinging from some predefined directions Ω_(MIN,d), d = 1, ..., O_(MIN). Along with the modified ambient HOA component C_(M,A)(k - 1), a temporally predicted modified ambient HOA component C_(P,M,A)(k - 1) is computed to be later used in the Gain Control processing block in order to allow a reasonable look ahead.

The information about the modification of the ambient HOA component is directly related to the assignment of all possible types of signals to the available channels. The final information about the assignment is contained in the final assignment vector v_(A)(k - 2). In order to compute this vector, information contained in the target assignment vector v_(A,T)(k - 1) is exploited.

Using the information provided by the assignment vector v_(A)(k - 2), the Channel Assignment assigns the appropriate signals contained in X_(PS)(k - 2) and those contained in C_(M,A)(k - 2) to the I available channels, yielding the signals y_(i)(k - 2), i = 1, ..., I. Further, appropriate signals contained in X_(PS)(k - 1) and those in C_(P,M,A)(k - 1) are also assigned to the I available channels, yielding the predicted signals y_(P,i)(k - 2), i = 1, ..., I. Each of the signals y_(i)(k - 2), i = 1, ..., I, is finally processed by a Gain Control, where the signal gain is smoothly modified to achieve a value range that is suitable for the perceptual encoders. The predicted signal frames y_(P,i)(k - 2), i = 1, ..., I, allow a kind of look ahead in order to avoid severe gain changes between successive blocks. The gain modifications are assumed to be reverted in the spatial decoder with the gain control side information, consisting of the exponents e_(i)(k - 2) and the exception flags β_(i)(k - 2), i = 1, ..., I.

FIGS. 2A and 2B show the structure of a conventional architecture of a HOA decompressor, as proposed in [4]. Conventionally, HOA decompression consists of the counterparts of the HOA compressor components, arranged in reverse order. It can be subdivided into a perceptual and source decoding part depicted in FIG. 2A and a spatial HOA decoding part depicted in FIG. 2B.

In the perceptual and side info source decoder, the bit stream is first de-multiplexed into the perceptually coded representation of the I signals and into the coded side information describing how to create an HOA representation thereof. Subsequently, a perceptual decoding of the I signals and a decoding of the side information are performed. Then, the spatial HOA decoder creates the reconstructed HOA representation from the I signals and the side information.

Conventionally, spatial HOA decoding works as follows.

In the spatial HOA decoder, each of the perceptually decoded signals

${\hat{\text{z}}}_{i}(k),i \in \left\{ {1,\,...,\, I} \right\}$

, is first input to an Inverse Gain Control processing block together with the associated gain correction exponent e_(i)(k) and gain correction exception flag β_(i)(k). The i-th Inverse Gain Control processing provides a gain corrected signal frame

${\overset{\frown}{\text{y}}}_{i}(k)$

.

All of the I gain corrected signal frames

${\overset{\frown}{\text{y}}}_{i}(k),\,\, i \in \left\{ {1,\,\,...,\,\, I} \right\}$

, are passed together with the assignment vector v_(AMB,ASSIGN)(k) and the tuple sets M_(DIR)(k + 1) and M_(VEC)(k + 1) to the Channel Reassignment. The tuple sets M_(DIR)(k + 1) and M_(VEC)(k + 1) are defined above (for spatial HOA encoding), and the assignment vector v_(AMB,ASSIGN)(k) consists of I components, which indicate for each transmission channel if and which coefficient sequence of the ambient HOA component it contains. In the Channel Reassignment the gain corrected signal frames

${\overset{\frown}{\text{y}}}_{i}(k)$

are redistributed to reconstruct the frame

${\overset{\frown}{\text{X}}}_{\text{PS}}(k)$

of all predominant sound signals (i.e., all directional and vector based signals) and the frame C_(I,AMB)(k) of an intermediate representation of the ambient HOA component. Additionally, the set J_(AMB,ACT)(k) of indices of coefficient sequences of the ambient HOA component, which are active in the k-th frame, and the sets

J_(E)(k − 1),  J_(D)(k − 1),  and J_(U)(k − 1)

of coefficient indices of the ambient HOA component, which have to be enabled, disabled and to remain active in the (k - 1)-th frame, are provided.

In the Predominant Sound Synthesis the HOA representation of the predominant sound component

${\overset{\frown}{\text{C}}}_{\text{PS}}\left( {k - 1} \right)$

is computed from the frame

${\overset{\frown}{\text{X}}}_{\text{PS}}(k)$

of all predominant sound signals using the tuple set M_(DIR)(k + 1) and the set ζ(k + 1) of prediction parameters, the tuple set M_(VEC)(k + 1) and the sets

J_(E)(k − 1),  J_(D)(k − 1),  and J_(U)(k − 1)

.

In the Ambience Synthesis, the ambient HOA component frame

${\overset{\frown}{\text{C}}}_{\text{AMB}}\left( {k - 1} \right)$

is created from the frame C_(I,AMB)(k) of the intermediate representation of the ambient HOA component, using the set

J_(AMB,  ACT)(k)

of indices of coefficient sequences of the ambient HOA component which are active in the k-th frame. Note the delay of one frame, which is introduced due to the synchronization with the predominant sound HOA component.

Finally, in the HOA Composition the ambient HOA component frame

${\overset{\frown}{\text{C}}}_{\text{AMB}}\left( {k - 1} \right)$

and the frame

${\overset{\frown}{C}}_{\text{PS}}\left( {k - 1} \right)$

of the predominant sound HOA component are superposed to provide the decoded HOA frame

$\overset{\frown}{\text{C}}\left( {k - 1} \right)$

.

As has become clear from the coarse description of the HOA compression and decompression method above, the compressed representation consists of I quantized monaural signals and some additional side information. A fixed number O_(MIN) out of these I quantized monaural signals represent a spatially transformed version of the first O_(MIN) coefficient sequences of the ambient HOA component C_(AMB)(k - 2). The type of the remaining I - O_(MIN) signals can vary between successive frames, being either directional, vector based, empty or representing an additional coefficient sequence of the ambient HOA component C_(AMB)(k - 2). Taken as it is, the compressed HOA representation is meant to be monolithic. In particular, one problem is how to split the described representation into a low quality base layer and an enhancement layer.

According to the disclosed invention, a candidate for a low quality base layer are the O_(MIN) channels that contain a spatially transformed version of the first O_(MIN) coefficient sequences of the ambient HOA component C_(AMB)(k - 2). What makes these (without loss of generality: first) O_(MIN) channels a good choice to form a low quality base layer is their time-invariant type. However, the respective signals lack any predominant sound components, which are essential for the sound scene. This can also be seen in the computation of the ambient HOA component C_(AMB)(k - 1), which is carried out by subtraction of the predominant sound HOA representation C_(PS)(k - 1) from the original HOA representation C(k - 1) according to

C_(AMB)(k − 1) = C(k − 1) − C_(PS)(k − 1)
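The subtraction above can be sketched in a few lines. This is a minimal illustration only, assuming that a frame is stored as a list of coefficient sequences, each a list of samples; the function name and data layout are hypothetical and not part of the description.

```python
def ambient_component(c, c_ps):
    """Return C_AMB = C - C_PS, computed per coefficient sequence and sample.

    c    -- frame of the original HOA representation (list of rows)
    c_ps -- frame of the predominant sound HOA representation (same shape)
    """
    if len(c) != len(c_ps):
        raise ValueError("both frames need the same number of coefficient sequences")
    return [
        [x - p for x, p in zip(row, ps_row)]
        for row, ps_row in zip(c, c_ps)
    ]
```

Because every row of the result has the predominant sound removed, a base layer built only from such rows carries no predominant sound at all.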

A solution to this problem is to include the predominant sound components at a low spatial resolution into the base layer.

Proposed amendments to the HOA compression are described in the following.

FIG. 3 shows the structure of an architecture of a spatial HOA encoding and perceptual encoding portion of a HOA compressor according to one embodiment of the invention.

To include also the predominant sound components at a low spatial resolution into the base layer, the ambient HOA component C_(AMB)(k - 1), which is output by the HOA Decomposition processing in the spatial HOA encoder (see FIG. 1A), is replaced by a modified version

${\widetilde{\text{C}}}_{\text{AMB}}\left( {k - 1} \right) = \begin{bmatrix}{{\widetilde{\text{c}}}_{\text{AMB,1}}\left( {k - 1} \right)} \\{{\widetilde{\text{c}}}_{\text{AMB,2}}\left( {k - 1} \right)} \\ \vdots \\{{\widetilde{\text{c}}}_{\text{AMB,}O}\left( {k - 1} \right)}\end{bmatrix}$

whose elements are given by

${\widetilde{\text{c}}}_{\text{AMB,}n}\left( {k - 1} \right) = \left\{ \begin{array}{ll}{\text{c}_{n}\left( {k - 1} \right)} & {\text{for } 1 \leq n \leq O_{\text{MIN}}} \\{\text{c}_{\text{AMB,}n}\left( {k - 1} \right)} & {\text{for } O_{\text{MIN}} + 1 \leq n \leq O}\end{array} \right.$

In other words, the first O_(MIN) coefficient sequences of the ambient HOA component, which are supposed to be always transmitted in a spatially transformed form, are replaced by the coefficient sequences of the original HOA component. The other processing blocks of the spatial HOA encoder can remain unchanged.
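The replacement rule can be sketched as follows. This is only an illustration under the assumption that a frame is a list of O coefficient sequences (rows), with 0-based list positions standing in for the 1-based indices n of the text; the function name is hypothetical.

```python
def modify_ambient_for_layered_mode(c, c_amb, o_min):
    """Build the modified ambient frame used in layered mode.

    Rows 1..O_MIN (list positions 0..o_min-1) are taken from the original
    HOA frame c, the remaining rows from the ambient frame c_amb.
    """
    if len(c) != len(c_amb):
        raise ValueError("c and c_amb need the same number of coefficient sequences")
    if not 0 <= o_min <= len(c):
        raise ValueError("o_min must lie between 0 and the number of sequences")
    lower = [list(row) for row in c[:o_min]]       # original coefficients
    upper = [list(row) for row in c_amb[o_min:]]   # ambient (residual) coefficients
    return lower + upper
```

With o_min set to 0 the function returns the unmodified ambient frame, which corresponds to not using the layered mode.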

It is important to note that this change of the HOA Decomposition processing can be seen as an initial operation making the HOA compression work in a so-called “dual layer” or “two layer” mode. This mode provides a bit stream that can be split up into a low quality Base Layer and an Enhancement Layer. Whether or not this mode is used can be signaled by a single bit in access units of the total bit stream.

A possible consequent modification of the bit stream multiplexing to provide bit streams for a base layer and an enhancement layer is illustrated in FIGS. 3 and 4, as described further below.

The base layer bit stream

${\overset{\smile}{\text{B}}}_{\text{BASE}}\left( {k - 2} \right)$

only includes the perceptually encoded signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$

, and the corresponding coded gain control side information, consisting of the exponents e_(i)(k - 2) and the exception flags β_(i)(k - 2), i = 1, ..., O_(MIN). The remaining perceptually encoded signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$

and the encoded remaining side information are included into the enhancement layer bit stream. The base layer and enhancement layer bit streams

${\overset{\smile}{\text{B}}}_{\text{BASE}}\left( {k - 2} \right)\,\,\text{and}\,{\overset{\smile}{\text{B}}}_{\text{ENH}}\left( {k - 2} \right)$

are then jointly transmitted instead of the former total bit stream

$\overset{\smile}{\text{B}}\left( {k - 2} \right)$

.
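The split of one frame into the two layers can be sketched as below. This is a minimal sketch, assuming the I perceptually encoded signals and the gain control side information are held in plain lists ordered by channel index; the dictionary layout, the key names and the function name are hypothetical and do not describe an actual bit stream syntax.

```python
def split_into_layers(z_coded, exponents, exception_flags, other_side_info, o_min):
    """Partition one frame into base layer and enhancement layer content.

    The base layer receives only the first O_MIN signals with their
    exponents and exception flags; everything else goes to the
    enhancement layer.
    """
    if not (len(z_coded) == len(exponents) == len(exception_flags)):
        raise ValueError("per-channel lists must have equal length")
    if not 0 <= o_min <= len(z_coded):
        raise ValueError("o_min must not exceed the number of channels")
    base = {
        "signals": z_coded[:o_min],
        "exponents": exponents[:o_min],
        "exception_flags": exception_flags[:o_min],
    }
    enhancement = {
        "signals": z_coded[o_min:],
        "exponents": exponents[o_min:],
        "exception_flags": exception_flags[o_min:],
        # Remaining side information (e.g. tuple sets, prediction
        # parameters, assignment vector) travels only in this layer.
        "other_side_info": other_side_info,
    }
    return base, enhancement
```

A receiver that obtains only the base part therefore never sees the assignment vector or the tuple sets, which is why the base layer channels need a time-invariant type.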

In FIG. 3 and FIG. 4, an apparatus for compressing a HOA signal being an input HOA representation with input time frames (C(k)) of HOA coefficient sequences is shown. Said apparatus comprises a spatial HOA encoding and perceptual encoding portion for spatial HOA encoding of the input time frames and subsequent perceptual encoding, which is shown in FIG. 3, and a source coder portion for source encoding, which is shown in FIG. 4.

The spatial HOA encoding and perceptual encoding portion 300 comprises a Direction and Vector Estimation block 301, a delay 302, a HOA Decomposition block 303, an Ambient Component Modification block 304, a Channel Assignment block 305, and a plurality of Gain Control blocks 306.

The Direction and Vector Estimation block 301 is adapted for performing Direction and Vector Estimation processing of the HOA signal, wherein data comprising first tuple sets M_(DIR)(k) for directional signals and second tuple sets M_(VEC)(k) for vector based signals are obtained, each of the first tuple sets M_(DIR)(k) comprising an index of a directional signal and a respective quantized direction, and each of the second tuple sets M_(VEC)(k) comprising an index of a vector based signal and a vector defining the directional distribution of the signals.

The HOA Decomposition block 303 is adapted for decomposing each input time frame of the HOA coefficient sequences into a frame of a plurality of predominant sound signals X_(PS)(k-1) and a frame of an ambient HOA component

${\widetilde{\text{C}}}_{\text{AMB}}\left( {k - 1} \right)$

, wherein the predominant sound signals X_(PS)(k-1) comprise said directional sound signals and said vector based sound signals, and wherein the ambient HOA component

${\widetilde{\text{C}}}_{\text{AMB}}\left( {k - 1} \right)$

comprises HOA coefficient sequences representing a residual between the input HOA representation and the HOA representation of the predominant sound signals, and wherein the decomposing further provides prediction parameters ξ(k-1) and a target assignment vector v_(A,T)(k - 1). The prediction parameters ξ(k-1) describe how to predict portions of the HOA signal representation from the directional signals within the predominant sound signals X_(PS)(k-1) so as to enrich predominant sound HOA components, and the target assignment vector v_(A,T)(k - 1) contains information about how to assign the predominant sound signals to a given number I of channels.

The Ambient Component Modification block 304 is adapted for modifying the ambient HOA component C_(AMB)(k - 1) according to the information provided by the target assignment vector v_(A,T)(k - 1), wherein it is determined which coefficient sequences of the ambient HOA component C_(AMB)(k - 1) are to be transmitted in the given number I of channels, depending on how many channels are occupied by predominant sound signals, and wherein a modified ambient HOA component C_(M,A)(k - 2) and a temporally predicted modified ambient HOA component C_(P,M,A)(k - 1) are obtained, and wherein a final assignment vector v_(A)(k - 2) is obtained from information in the target assignment vector v_(A,T)(k - 1).

The Channel Assignment block 305 is adapted for assigning the predominant sound signals X_(PS)(k-1) obtained from the decomposing, the determined coefficient sequences of the modified ambient HOA component C_(M,A)(k - 2) and of the temporally predicted modified ambient HOA component C_(P,M,A)(k - 1) to the given number I of channels using the information provided by the final assignment vector v_(A)(k - 2), wherein transport signals y_(i)(k - 2), i = 1, ..., I and predicted transport signals y_(P,i)(k - 2), i = 1, ..., I are obtained.

The plurality of Gain Control blocks 306 is adapted for performing gain control (805) to the transport signals y_(i)(k - 2) and the predicted transport signals y_(P,i)(k - 2), wherein gain modified transport signals z_(i)(k - 2), exponents e_(i)(k - 2) and exception flags β_(i)(k - 2) are obtained.

FIG. 4 shows the structure of an architecture of a source coder portion of a HOA compressor according to one embodiment of the invention. The source coder portion as shown in FIG. 4 comprises a Perceptual Coder 310, a Side Information Source Coder block with two coders 320,330, namely a Base Layer Side Information Source Coder 320 and an Enhancement Layer Side Information Encoder 330, and two multiplexers 340,350, namely a Base Layer Bitstream Multiplexer 340 and an Enhancement Layer Bitstream Multiplexer 350. The Side Information Source Coders may be in a single Side Information Source Coder block.

The Perceptual Coder 310 is adapted for perceptually coding 806 said gain modified transport signals z_(i)(k - 2), wherein perceptually encoded transport signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right),\,\, i = 1,\,\,...,\,\, I$

are obtained.

The Side Information Source Coders 320,330 are adapted for encoding side information comprising said exponents e_(i)(k - 2) and exception flags β_(i)(k - 2), said first tuple sets M_(DIR)(k) and second tuple sets M_(VEC)(k), said prediction parameters ξ(k-1) and said final assignment vector v_(A)(k - 2), wherein encoded side information

$\overset{\smile}{\text{Γ}}\left( {k - 2} \right)$

is obtained.

The multiplexers 340,350 are adapted for multiplexing the perceptually encoded transport signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right)$

and the encoded side information

$\overset{\smile}{\text{Γ}}\left( {k - 2} \right)$

into a multiplexed data stream

, wherein the ambient HOA component

${\widetilde{\text{C}}}_{\text{AMB}}\left( {k - 1} \right)$

obtained in the decomposing comprises first HOA coefficient sequences of the input HOA representation c_(n)(k - 1) in O_(MIN) lowest positions (i.e., those with lowest indices) and second HOA coefficient sequences c_(AMB,n)(k - 1) in remaining higher positions. As explained below with respect to eq.(4)-(6), the second HOA coefficient sequences are part of an HOA representation of a residual between the input HOA representation and the HOA representation of the predominant sound signals. Further, the first O_(MIN) exponents e_(i)(k - 2), i = 1, ..., O_(MIN) and exception flags β_(i)(k - 2), i = 1, ..., O_(MIN) are encoded in a Base Layer Side Information Source Coder 320, wherein encoded Base Layer side information

${\overset{\smile}{\text{Γ}}}_{\text{BASE}}\left( {k - 2} \right)$

is obtained, and wherein O_(MIN) = (N_(MIN) + 1)² and O = (N + 1)², with N_(MIN) ≤ N and O_(MIN) ≤ I, and N_(MIN) is a predefined integer value. The first O_(MIN) perceptually encoded transport signals

${\overset{\smile}{\text{z}}}_{i}\left( {k - 2} \right),\,\, i = 1,\,\,...,\,\, O_{MIN}$

and the encoded Base Layer side information

${\overset{\smile}{\text{Γ}}}_{\text{BASE}}\left( {k - 2} \right)$

are multiplexed in a Base Layer Bitstream Multiplexer 340 (which is one of said multiplexers), wherein a Base Layer bitstream

${\overset{\smile}{\text{B}}}_{BASE}\left( {k - 2} \right)$

is obtained. The Base Layer Side Information Source Coder 320 is one of the Side Information Source Coders, or it is within a Side Information Source Coder block.

The remaining I - O_(MIN) exponents e_(i)(k - 2), i = O_(MIN) + 1, ..., I and exception flags β_(i)(k - 2), i = O_(MIN) + 1, ..., I, said first tuple sets M_(DIR)(k - 1) and second tuple sets M_(VEC)(k - 1), said prediction parameters ξ(k-1) and said final assignment vector v_(A)(k - 2) are encoded in an Enhancement Layer Side Information Encoder 330, wherein encoded enhancement layer side information

${\overset{\smile}{\text{Γ}}}_{ENH}\left( {k - 2} \right)$

is obtained. The Enhancement Layer Side Information Source Coder 330 is one of the Side Information Source Coders, or is within a Side Information Source Coder block.

The remaining I - O _(MIN) perceptually encoded transport signals

${\overset{\smile}{z}}_{i}\left( {k - 2} \right),\,\, i = O_{MIN} + 1,\,\,...,\,\, I$

and the encoded enhancement layer side information

${\overset{\smile}{\text{Γ}}}_{ENH}\left( {k - 2} \right)$

are multiplexed in an Enhancement Layer Bitstream Multiplexer 350 (which is also one of said multiplexers), wherein an Enhancement Layer bitstream

${\overset{\smile}{B}}_{ENH}\left( {k - 2} \right)$

is obtained. Further, a mode indication LMF_(E) is added in a multiplexer or an indication insertion block. The mode indication LMF_(E) signals usage of a layered mode, which is used for correct decompression of the compressed signal.

In one embodiment, the apparatus for encoding further comprises a mode selector adapted for selecting a mode, the mode being indicated by the mode indication LMF_(E) and being one of a layered mode and a non-layered mode. In the non-layered mode, the ambient HOA component

C̃_(AMB)(k − 1)

comprises only HOA coefficient sequences representing a residual between the input HOA representation and the HOA representation of the predominant sound signals (i.e., no coefficient sequences of the input HOA representation).

Proposed amendments of the HOA decompression are described in the following.

In the layered mode, the modification of the ambient HOA component C_(AMB)(k - 1) in the HOA compression is considered at the HOA decompression by appropriately modifying the HOA composition.

In the HOA decompressor, the demultiplexing and decoding of the base layer and enhancement layer bit streams are performed according to FIG. 5. The base layer bit stream

${\overset{\smile}{B}}_{BASE}(k)$

is de-multiplexed into the coded representation of the base layer side information and the perceptually encoded signals. Subsequently, the coded representation of the base layer side information and the perceptually encoded signals are decoded to provide the exponents e_(i)(k) and the exception flags on the one hand, and the perceptually decoded signals on the other hand. Similarly, the enhancement layer bit stream is de-multiplexed and decoded to provide the perceptually decoded signals and the remaining side information (see FIG. 5). With this layered mode, the spatial HOA decoding part also has to be modified to consider the modification of the ambient HOA component C_(AMB)(k - 1) in the spatial HOA encoding. The modification is accomplished in the HOA composition.

In particular, the reconstructed HOA representation

Ĉ(k − 1) = Ĉ_(PS)(k − 1) + Ĉ_(AMB)(k − 1)

is replaced by its modified version

$\widetilde{\hat{C}}\left( {k - 1} \right) = \begin{bmatrix}{{\widetilde{\hat{\text{c}}}}_{1}\left( {k - 1} \right)} \\{{\widetilde{\hat{\text{c}}}}_{2}\left( {k - 1} \right)} \\ \vdots \\{{\widetilde{\hat{\text{c}}}}_{O}\left( {k - 1} \right)}\end{bmatrix}$

whose elements are given by

${\widetilde{\hat{\text{c}}}}_{n}\left( {k - 1} \right) = \left\{ \begin{array}{ll}{{\hat{\text{c}}}_{\text{AMB,}n}\left( {k - 1} \right)} & {\text{for } 1 \leq n \leq O_{\text{MIN}}} \\{{\hat{\text{c}}}_{n}\left( {k - 1} \right)} & {\text{for } O_{\text{MIN}} + 1 \leq n \leq O}\end{array} \right.$

That means that the predominant sound HOA component is not added to the ambient HOA component for the first O_(MIN) coefficient sequences, since it is already included therein. All other processing blocks of the HOA spatial decoder remain unchanged.
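The modified composition can be sketched as follows. This is a minimal sketch, assuming decoded frames are lists of O coefficient sequences (rows) and that 0-based list positions stand in for the 1-based indices n of the text; the function name is hypothetical.

```python
def compose_hoa_layered(c_amb, c_ps, o_min):
    """Compose the decoded HOA frame in layered mode.

    For the first O_MIN coefficient sequences the ambient row is copied
    unchanged, because it already contains the predominant sound. For the
    remaining rows the ambient and predominant sound rows are added.
    """
    if len(c_amb) != len(c_ps):
        raise ValueError("both frames need the same number of coefficient sequences")
    composed = []
    for n, (amb_row, ps_row) in enumerate(zip(c_amb, c_ps)):
        if n < o_min:
            composed.append(list(amb_row))
        else:
            composed.append([a + p for a, p in zip(amb_row, ps_row)])
    return composed
```

Setting o_min to 0 reduces the function to the plain superposition of both components, i.e. the composition used without layering.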

In the following, the HOA decompression in the pure presence of a low quality base layer bit stream

${\overset{\smile}{B}}_{\text{BASE}}(k)$

is briefly considered.

The bit stream is first de-multiplexed and decoded to provide the reconstructed signals

ẑ_(i)(k)

and the corresponding gain control side information, consisting of the exponents e_(i)(k) and the exception flags β_(i)(k), i = 1, ..., O_(MIN). Note that in absence of the enhancement layer, the perceptually coded signals

${\overset{\smile}{z}}_{i}\left( {k - 2} \right),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$

, are not available. A possible way of addressing this situation is to set the signals

ẑ_(i)(k),  i = O_(MIN) + 1,  ...,  I

, to zero, which automatically causes the reconstructed predominant sound component C_(PS)(k - 1) to be zero.
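The zero-filling of the missing enhancement layer signals can be sketched as below. This is an illustration only, assuming each decoded signal frame is a list of samples and that the downstream spatial decoder expects a fixed number of channels; the function and parameter names are hypothetical.

```python
def fill_missing_enhancement_signals(base_signals, num_channels, frame_length):
    """Return a full channel list for the case that only the base layer exists.

    The decoded base layer frames are kept, and every channel that would
    have come from the enhancement layer is replaced by an all-zero frame.
    """
    if len(base_signals) > num_channels:
        raise ValueError("more base layer signals than channels")
    for frame in base_signals:
        if len(frame) != frame_length:
            raise ValueError("base layer frame has unexpected length")
    missing = num_channels - len(base_signals)
    zero_frames = [[0.0] * frame_length for _ in range(missing)]
    return [list(frame) for frame in base_signals] + zero_frames
```

Since all predominant sound signals would be carried by the zeroed channels, any predominant sound synthesis fed with this list yields a zero contribution.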

In a next step, in the spatial HOA decoder, the first O_(MIN) Inverse Gain Control processing blocks provide gain corrected signal frames

ŷ_(i)(k),  i = 1,  ...,  O_(MIN)

, which are used to construct the frame C_(I,AMB)(k) of an intermediate representation of the ambient HOA component by the Channel Reassignment. Note that the set

J_(AMB,  ACT)(k)

of indices of coefficient sequences of the ambient HOA component, which are active in the k-th frame, contains only the indices 1, 2, ..., O_(MIN). In the Ambience Synthesis, the spatial transform of the first O_(MIN) coefficient sequences is reverted to provide the ambient HOA component frame C_(AMB)(k - 1). Finally, the reconstructed HOA representation is computed according to eq.(6).

FIG. 5 and FIG. 6 show the structure of an architecture of a HOA decompressor according to one embodiment of the invention. The apparatus comprises a perceptual decoding and source decoding portion as shown in FIG. 5, a spatial HOA decoding portion as shown in FIG. 6, and a mode detector adapted for detecting a layered mode indication LMF_(D) indicating that the compressed HOA signal comprises a compressed base layer bitstream

${\overset{\smile}{B}}_{BASE}(k)$

and a compressed enhancement layer bitstream.

FIG. 5 shows the structure of an architecture of a perceptual decoding and source decoding portion of a HOA decompressor according to one embodiment of the invention. The perceptual decoding and source decoding portion comprises a first demultiplexer 510, a second demultiplexer 520, a Base Layer Perceptual Decoder 540 and an Enhancement Layer Perceptual Decoder 550, a Base Layer Side Information Source Decoder 530 and an Enhancement Layer Side Information Source Decoder 560.

The first demultiplexer 510 is adapted for demultiplexing the compressed base layer bitstream

${\overset{\smile}{B}}_{BASE}(k)$

, wherein first perceptually encoded transport signals

${\overset{\smile}{z}}_{i}(k),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$

and first encoded side information

${\overset{\smile}{\text{Γ}}}_{BASE}(k)$

are obtained. The second demultiplexer 520 is adapted for demultiplexing the compressed enhancement layer bitstream

${\overset{\smile}{B}}_{ENH}(k)$

, wherein second perceptually encoded transport signals

${\overset{\smile}{z}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$

and second encoded side information

${\overset{\smile}{\text{Γ}}}_{ENH}(k)$

are obtained.

The Base Layer Perceptual Decoder 540 and the Enhancement Layer Perceptual Decoder 550 are adapted for perceptually decoding 904 the perceptually encoded transport signals

${\overset{\smile}{z}}_{i}(k),\,\, i = 1,\,\,...,I$

, wherein perceptually decoded transport signals

ẑ_(i)(k)

are obtained, and wherein in the Base Layer Perceptual Decoder 540 said first perceptually encoded transport signals

${\overset{\smile}{z}}_{i}(k),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$

of the base layer are decoded and first perceptually decoded transport signals

ẑ_(i)(k),  i = 1,  ...,  O_(MIN)

are obtained. In the Enhancement Layer Perceptual Decoder 550, said second perceptually encoded transport signals

${\overset{\smile}{z}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$

of the enhancement layer are decoded and second perceptually decoded transport signals

ẑ_(i)(k),  i= O_(MIN) + 1,  ...,  I

are obtained.

The Base Layer Side Information Source Decoder 530 is adapted for decoding 905 the first encoded side information

${\overset{\smile}{\text{Γ}}}_{BASE}(k)$

, wherein first exponents e_(i)(k), i = 1, ..., O_(MIN) and first exception flags β_(i)(k), i = 1, ..., O_(MIN) are obtained.

The Enhancement Layer Side Information Source Decoder 560 is adapted for decoding 906 the second encoded side information

${\overset{\smile}{\text{Γ}}}_{ENH}(k)$

, wherein second exponents e_(i)(k), i = O_(MIN) + 1, ..., I and second exception flags β_(i)(k), i = O_(MIN) + 1, ..., I are obtained, and wherein further data are obtained. The further data comprise a first tuple set M_(DIR)(k + 1) for directional signals and a second tuple set M_(VEC)(k + 1) for vector based signals. Each tuple of the first tuple set M_(DIR)(k + 1) comprises an index of a directional signal and a respective quantized direction, and each tuple of the second tuple set M_(VEC)(k + 1) comprises an index of a vector based signal and a vector defining the directional distribution of the vector based signal. Further, prediction parameters ξ(k+1) and an ambient assignment vector ν_(AMB,ASSIGN)(k) are obtained, wherein the ambient assignment vector ν_(AMB,ASSIGN)(k) comprises components that indicate for each transmission channel if and which coefficient sequence of the ambient HOA component it contains.

FIG. 6 shows the structure of an architecture of a spatial HOA decoding portion of a HOA decompressor according to one embodiment of the invention. The spatial HOA decoding portion comprises a plurality of inverse gain control units 604, a Channel Reassignment block 605, a Predominant Sound Synthesis block 606, an Ambient Synthesis block 607, and a HOA Composition block 608.

The plurality of inverse gain control units 604 are adapted for performing inverse gain control, wherein said first perceptually decoded transport signals

${\overset{\frown}{z}}_{i}(k),\,\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$

are transformed into first gain corrected signal frames

${\overset{\frown}{y}}_{i}(k),\,\,\, i = 1,\,\,\,...,\,\,\, O_{\text{MIN}}$

according to the first exponents e_(i)(k), i = 1, ..., O_(MIN) and the first exception flags β_(i)(k), i = 1, ..., O_(MIN), and wherein the second perceptually decoded transport signals

${\overset{\frown}{z}}_{i}(k),\,\, i = O_{MIN} + 1,\,\,...,\,\, I$

are transformed into second gain corrected signal frames

${\overset{\frown}{y}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$

according to the second exponents e_(i)(k), i = O_(MIN) + 1, ..., I and the second exception flags β_(i)(k), i = O_(MIN) + 1, ..., I.

The Channel Reassignment block 605 is adapted for redistributing 911 the first and second gain corrected signal frames

${\overset{\frown}{y}}_{i}(k),\,\,\, i = 1,\,\,...,\,\, I$

to I channels, wherein frames of predominant sound signals

${\overset{\frown}{X}}_{PS}(k)$

are reconstructed, the predominant sound signals comprising directional signals and vector based signals, and wherein a modified ambient HOA component

C̃_(I, AMB)(k)

is obtained, and wherein the assigning is made according to said ambient assignment vector ν_(AMB,ASSIGN)(k) and to information in said first and second tuple sets M_(DIR)(k + 1), M_(VEC)(k + 1). Further, the Channel Reassignment block 605 is adapted for generating a first set of indices

J_(AMB,ACT)(k)

of coefficient sequences of the modified ambient HOA component that are active in a k^(th) frame, and a second set of indices

J_(E)(k − 1),  J_(D)(k − 1),  J_(U)(k − 1)

of coefficient sequences of the modified ambient HOA component that have to be enabled, disabled and to remain active in the (k-1)^(th) frame.
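The ambient part of this reassignment can be sketched as follows. The encoding of the assignment vector is an assumption made purely for illustration: an entry of 0 is taken to mean that the channel carries no ambient coefficient sequence, and a positive entry n is taken to mean that the channel carries ambient coefficient sequence n (1-based). The description only states that the vector indicates if and which coefficient sequence each channel contains; the function name and data layout are likewise hypothetical.

```python
def reassign_ambient(y, v_amb_assign, num_coeffs):
    """Rebuild an intermediate ambient frame from gain corrected channels.

    y            -- list of I gain corrected signal frames (lists of samples)
    v_amb_assign -- list of I entries; 0 = no ambient content (assumed
                    convention), n > 0 = ambient coefficient sequence n
    num_coeffs   -- total number O of coefficient sequences
    Returns the intermediate ambient frame and the sorted active indices.
    """
    if len(y) != len(v_amb_assign):
        raise ValueError("one assignment entry is needed per channel")
    frame_length = len(y[0]) if y else 0
    c_i_amb = [[0.0] * frame_length for _ in range(num_coeffs)]
    active = set()
    for signal, n in zip(y, v_amb_assign):
        if n == 0:
            continue  # channel holds a predominant sound signal or is empty
        if not 1 <= n <= num_coeffs:
            raise ValueError("coefficient index out of range")
        c_i_amb[n - 1] = list(signal)
        active.add(n)
    return c_i_amb, sorted(active)
```

Channels with a zero entry are left for the predominant sound reconstruction, which this sketch does not cover.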

The Predominant Sound Synthesis block 606 is adapted for synthesizing 912 a HOA representation of the predominant HOA sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

from said predominant sound signals

${\overset{\frown}{X}}_{PS}(k)$

, wherein the first and second tuple sets M_(DIR)(k + 1), M_(VEC)(k + 1), the prediction parameters ξ(k+1) and the second set of indices

J_(E)(k − 1),  J_(D)(k − 1),  J_(U)(k − 1)

are used.

The Ambient Synthesis block 607 is adapted for synthesizing 913 an ambient HOA component

${\overset{\frown}{\widetilde{C}}}_{\text{AMB,ACT}}(k)$

from the modified ambient HOA component

C̃_(I,AMB)(k)

, wherein an inverse spatial transform for the first O_(MIN) channels is made and wherein the first set of indices

J_(AMB,ACT)(k)

is used, the first set of indices being indices of coefficient sequences of the ambient HOA component that are active in the k^(th) frame.

If the layered mode indication LMF_(D) indicates a layered mode with at least two layers, the ambient HOA component comprises in its O_(MIN) lowest positions (i.e., those with lowest indices) HOA coefficient sequences of the decompressed HOA signal

$\overset{\frown}{C}\left( {k - 1} \right)$

, and in remaining higher positions coefficient sequences that are part of an HOA representation of a residual. This residual is a residual between the decompressed HOA signal

$\overset{\frown}{C}\left( {k - 1} \right)$

and 914 the HOA representation of the predominant HOA sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

.

On the other hand, if the layered mode indication LMF_(D) indicates a single-layer mode, there are no HOA coefficient sequences of the decompressed HOA signal

$\overset{\frown}{C}\left( {k - 1} \right)$

comprised, and the ambient HOA component is a residual between the decompressed HOA signal

$\overset{\frown}{C}\left( {k - 1} \right)$

and the HOA representation of the predominant sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

.

The HOA Composition block 608 is adapted for adding the HOA representation of the predominant sound components to the ambient HOA component

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right),\,\,{\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$

, wherein coefficients of the HOA representation of the predominant sound signals and corresponding coefficients of the ambient HOA component are added, and wherein the decompressed HOA signal

$\overset{\frown}{C}'\left( {k - 1} \right)$

is obtained, and wherein,

if the layered mode indication LMF_(D) indicates a layered mode with at least two layers, only the highest I-O_(MIN) coefficient channels are obtained by addition of the predominant HOA sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

and the ambient HOA component

${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$

, and the lowest O_(MIN) coefficient channels of the decompressed HOA signal

$\overset{\frown}{C}'\left( {k - 1} \right)$

are copied from the ambient HOA component

${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$

. On the other hand, if the layered mode indication LMF_(D) indicates a single-layer mode, all coefficient channels of the decompressed HOA signal

$\overset{\frown}{C}'\left( {k - 1} \right)$

are obtained by addition of the predominant HOA sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

and the ambient HOA component

${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$

.

FIG. 7 shows transformation of frames from ambient HOA signals to modified ambient HOA signals.

FIG. 8 shows a flow-chart of a method for compressing a HOA signal.

The method 800 for compressing a Higher Order Ambisonics (HOA) signal being an input HOA representation of an order N with input time frames C(k) of HOA coefficient sequences comprises spatial HOA encoding of the input time frames and subsequent perceptual encoding and source encoding.

The spatial HOA encoding comprises steps of:

-   performing Direction and Vector Estimation processing 801 of the HOA signal in a Direction and Vector Estimation block 301, wherein data comprising first tuple sets M_(DIR)(k) for directional signals and second tuple sets M_(VEC)(k) for vector based signals are obtained, each of the first tuple sets M_(DIR)(k) comprising an index of a directional signal and a respective quantized direction, and each of the second tuple sets M_(VEC)(k) comprising an index of a vector based signal and a vector defining the directional distribution of the signals;
-   decomposing 802 in a HOA Decomposition block 303 each input time frame of the HOA coefficient sequences into a frame of a plurality of predominant sound signals X_(PS)(k-1) and a frame of an ambient HOA component C̃_(AMB)(k − 1), wherein the predominant sound signals X_(PS)(k-1) comprise said directional sound signals and said vector based sound signals, and wherein the ambient HOA component C̃_(AMB)(k − 1) comprises HOA coefficient sequences representing a residual between the input HOA representation and the HOA representation of the predominant sound signals, and wherein the decomposing 802 further provides prediction parameters ξ(k-1) and a target assignment vector ν_(A,T)(k - 1), the prediction parameters ξ(k-1) describing how to predict portions of the HOA signal representation from the directional signals within the predominant sound signals X_(PS)(k-1) so as to enrich predominant sound HOA components, and the target assignment vector ν_(A,T)(k - 1) containing information about how to assign the predominant sound signals to a given number I of channels;
-   modifying 803 in an Ambient Component Modification block 304 the ambient HOA component C_(AMB)(k - 1) according to the information provided by the target assignment vector ν_(A,T)(k - 1), wherein it is determined which coefficient sequences of the ambient HOA component C_(AMB)(k - 1) are to be transmitted in the given number I of channels, depending on how many channels are occupied by predominant sound signals, and wherein a modified ambient HOA component C_(M,A)(k - 2) and a temporally predicted modified ambient HOA component C_(P,M,A)(k - 1) are obtained, and wherein a final assignment vector ν_(A)(k - 2) is obtained from information in the target assignment vector ν_(A,T)(k - 1);
-   assigning 804 in a Channel Assignment block 305 the predominant sound signals X_(PS)(k-1) obtained from the decomposing, and the determined coefficient sequences of the modified ambient HOA component C_(M,A)(k - 2) and of the temporally predicted modified ambient HOA component C_(P,M,A)(k - 1) to the given number I of channels using the information provided by the final assignment vector ν_(A)(k - 2), wherein transport signals y_(i)(k - 2), i = 1, ..., I and predicted transport signals y_(P,i)(k - 2), i = 1, ..., I are obtained; and
-   performing gain control 805 to the transport signals y_(i)(k - 2) and the predicted transport signals y_(P,i)(k - 2) in a plurality of Gain Control blocks 306, wherein gain modified transport signals z_(i)(k - 2), exponents e_(i)(k - 2) and exception flags β_(i)(k - 2) are obtained.

The perceptual encoding and source encoding comprises steps of:

-   perceptually coding 806 in a Perceptual Coder 310 said gain modified transport signals z_(i)(k - 2), wherein perceptually encoded transport signals ${\overset{\smile}{z}}_{i}\left( {k - 2} \right),\,\, i = 1,\,\,...,\,\, I$ are obtained;
-   encoding 807 in one or more Side Information Source Coders 320, 330 side information comprising said exponents e_(i)(k - 2) and exception flags β_(i)(k - 2), said first tuple sets M_(DIR)(k) and second tuple sets M_(VEC)(k), said prediction parameters ξ(k-1) and said final assignment vector ν_(A)(k - 2), wherein encoded side information $\overset{\smile}{\text{Γ}}\left( {k - 2} \right)$ is obtained; and
-   multiplexing 808 the perceptually encoded transport signals ${\overset{\smile}{z}}_{i}\left( {k - 2} \right)$ and the encoded side information $\overset{\smile}{\text{Γ}}\left( {k - 2} \right)$, wherein a multiplexed data stream $\overset{\smile}{B}\left( {k - 2} \right)$ is obtained.

The ambient HOA component

C̃_(AMB)(k − 1)

obtained in the decomposing step 802 comprises first HOA coefficient sequences of the input HOA representation c_(n)(k - 1) in O_(MIN) lowest positions (i.e. those with lowest indices) and second HOA coefficient sequences c_(AMB,n)(k - 1) in remaining higher positions. The second coefficient sequences are part of an HOA representation of a residual between the input HOA representation and the HOA representation of the predominant sound signals.

The first O_(MIN) exponents e_(i)(k - 2), i = 1, ..., O_(MIN) and exception flags β_(i)(k - 2), i = 1, ..., O_(MIN) are encoded in a Base Layer Side Information Source Coder 320, wherein encoded Base Layer side information

${\overset{\smile}{\text{Γ}}}_{BASE}\left( {k - 2} \right)$

is obtained, and wherein O_(MIN) = (N_(MIN) + 1)² and O = (N + 1)², with N_(MIN) ≤ N and O_(MIN) ≤ I, and N_(MIN) is a predefined integer value.

The first O_(MIN) perceptually encoded transport signals

${\overset{\smile}{z}}_{i}\left( {k - 2} \right),\,\, i = 1,\,\,...,\,\, O_{MIN}$

and the encoded Base Layer side information

${\overset{\smile}{\text{Γ}}}_{BASE}\left( {k - 2} \right)$

are multiplexed 809 in a Base Layer Bitstream Multiplexer 340, wherein a Base Layer bitstream

${\overset{\smile}{B}}_{BASE}\left( {k - 2} \right)$

is obtained.

The remaining I - O_(MIN) exponents e_(i)(k - 2), i = O_(MIN) + 1, ..., I and exception flags β_(i)(k - 2), i = O_(MIN) + 1, ..., I, said first tuple sets M_(DIR)(k - 1) and second tuple sets M_(VEC)(k - 1), said prediction parameters ξ(k-1) and said final assignment vector ν_(A)(k - 2) (also shown as ν_(AMB,ASSIGN)(k) in the Figures) are encoded in an Enhancement Layer Side Information Encoder 330, wherein encoded enhancement layer side information

${\overset{\smile}{\text{Γ}}}_{ENH}\left( {k - 2} \right)$

is obtained.

The remaining I - O_(MIN) perceptually encoded transport signals

${\overset{\smile}{z}}_{i}\left( {k - 2} \right),\,\, i = O_{MIN} + 1,\,\,...,\,\, I$

and the encoded enhancement layer side information

${\overset{\smile}{\text{Γ}}}_{ENH}\left( {k - 2} \right)$

are multiplexed 810 in an Enhancement Layer Bitstream Multiplexer 350, wherein an Enhancement Layer bitstream

${\overset{\smile}{B}}_{ENH}\left( {k - 2} \right)$

is obtained.
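The split of the transport channels and their gain control side information into a base layer and an enhancement layer can be illustrated with a minimal Python sketch. It only partitions the I channels at index O_(MIN) = (N_(MIN) + 1)². The names `Layer` and `split_layers` are hypothetical and not part of the described method, and the further enhancement layer side information (tuple sets, prediction parameters, assignment vector) is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Layer:
    """Hypothetical container for one layer (illustration only)."""
    channels: List[bytes]   # perceptually encoded transport signals
    exponents: List[int]    # gain control exponents e_i
    flags: List[bool]       # gain control exception flags beta_i


def split_layers(channels: Sequence[bytes],
                 exponents: Sequence[int],
                 flags: Sequence[bool],
                 n_min: int) -> Tuple[Layer, Layer]:
    """Partition I channels into base layer (first O_MIN) and enhancement layer."""
    o_min = (n_min + 1) ** 2  # O_MIN = (N_MIN + 1)^2
    total = len(channels)     # I, the number of transport channels
    if not (len(exponents) == len(flags) == total):
        raise ValueError("channels, exponents and flags must have equal length")
    if o_min > total:
        raise ValueError("O_MIN must not exceed the number of channels I")
    base = Layer(list(channels[:o_min]),
                 list(exponents[:o_min]),
                 list(flags[:o_min]))
    enhancement = Layer(list(channels[o_min:]),
                        list(exponents[o_min:]),
                        list(flags[o_min:]))
    return base, enhancement
```

For example, with N_(MIN) = 1 and I = 6 channels, the base layer receives the first four channels and the enhancement layer the remaining two.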

A mode indication is added 811 that signals usage of a layered mode, as described above. The mode indication is added by an indication insertion block or a multiplexer.

In one embodiment, the method further comprises a final step of multiplexing the Base Layer bitstream

${\overset{\smile}{B}}_{BASE}\left( {k - 2} \right)$

, Enhancement Layer bitstream

${\overset{\smile}{B}}_{ENH}\left( {k - 2} \right)$

and mode indication into a single bitstream.

In one embodiment, said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components.

In one embodiment, in modifying the ambient HOA component, a fade in and fade out of coefficient sequences is performed if the HOA sequence indices of the chosen HOA coefficient sequences vary between successive frames.

In one embodiment, in modifying the ambient HOA component, a partial decorrelation of the ambient HOA component C_(AMB)(k - 1) is performed.

In one embodiment, the quantized direction comprised in the first tuple sets M_(DIR)(k) is a dominant direction.

FIG. 9 shows a flow-chart of a method for decompressing a compressed HOA signal.

In this embodiment of the invention, the method 900 for decompressing a compressed HOA signal comprises perceptual decoding and source decoding and subsequent spatial HOA decoding to obtain output time frames

$\overset{\frown}{C}\left( {k - 1} \right)$

of HOA coefficient sequences, and the method comprises a step of detecting 901 a layered mode indication LMF_(D) indicating that the compressed Higher Order Ambisonics (HOA) signal comprises a compressed base layer bitstream

${\overset{\smile}{B}}_{BASE}(k)$

and a compressed enhancement layer bitstream

${\overset{\smile}{B}}_{ENH}(k)$

.

The perceptual decoding and source decoding comprises steps of:

-   demultiplexing 902 the compressed base layer bitstream ${\overset{\smile}{B}}_{BASE}(k)$, wherein first perceptually encoded transport signals ${\overset{\smile}{z}}_{i}(k),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$ and first encoded side information ${\overset{\smile}{\text{Γ}}}_{BASE}(k)$ are obtained;
-   demultiplexing 903 the compressed enhancement layer bitstream ${\overset{\smile}{B}}_{ENH}(k)$, wherein second perceptually encoded transport signals ${\overset{\smile}{z}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$ and second encoded side information ${\overset{\smile}{\text{Γ}}}_{ENH}(k)$ are obtained;
-   perceptually decoding 904 the perceptually encoded transport signals ${\overset{\smile}{z}}_{i}(k),\,\, i = 1,\,\,...,\,\, I$, wherein perceptually decoded transport signals ${\overset{\frown}{z}}_{i}(k)$ are obtained, and wherein in a Base Layer Perceptual Decoder 540 said first perceptually encoded transport signals ${\overset{\smile}{z}}_{i}(k),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$ of the base layer are decoded and first perceptually decoded transport signals ${\overset{\frown}{z}}_{i}(k),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$ are obtained, and wherein in an Enhancement Layer Perceptual Decoder 550 said second perceptually encoded transport signals ${\overset{\smile}{z}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$ of the enhancement layer are decoded and second perceptually decoded transport signals ${\overset{\frown}{z}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$ are obtained;
-   decoding 905 the first encoded side information ${\overset{\smile}{\text{Γ}}}_{BASE}(k)$ in a Base Layer Side Information Source Decoder 530, wherein first exponents e_(i)(k), i = 1, ..., O_(MIN) and first exception flags β_(i)(k), i = 1, ..., O_(MIN) are obtained; and
-   decoding 906 the second encoded side information ${\overset{\smile}{\text{Γ}}}_{ENH}(k)$ in an Enhancement Layer Side Information Source Decoder 560, wherein second exponents e_(i)(k), i = O_(MIN) + 1, ..., I and second exception flags β_(i)(k), i = O_(MIN) + 1, ..., I are obtained, and wherein further data are obtained 907, the further data comprising a first tuple set M_(DIR)(k + 1) for directional signals and a second tuple set M_(VEC)(k + 1) for vector based signals, each tuple of the first tuple set M_(DIR)(k + 1) comprising an index of a directional signal and a respective quantized direction, and each tuple of the second tuple set M_(VEC)(k + 1) comprising an index of a vector based signal and a vector defining the directional distribution of the vector based signal, and further wherein prediction parameters ξ(k+1) 908 and an ambient assignment vector ν_(AMB,ASSIGN)(k) 909 are obtained. The ambient assignment vector ν_(AMB,ASSIGN)(k) comprises components that indicate for each transmission channel if and which coefficient sequence of the ambient HOA component it contains.
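The role of the ambient assignment vector can be sketched in Python as follows. The sketch assumes, purely for illustration, that an entry of 0 means "this channel carries no ambient coefficient sequence" and that a positive entry n is the 1-based index of the ambient coefficient sequence carried by that channel. This encoding and the name `reassign_channels` are assumptions, not taken from the description. Which of the remaining channels actually carry predominant sound signals would be decided from the tuple sets M_(DIR) and M_(VEC), which the sketch does not model.

```python
from typing import Dict, List, Sequence, Tuple

Frame = List[float]


def reassign_channels(frames: Sequence[Frame],
                      amb_assign: Sequence[int]
                      ) -> Tuple[Dict[int, Frame], List[int]]:
    """Sort gain corrected frames into ambient coefficients and other channels.

    amb_assign[i] == 0 : channel i carries no ambient coefficient (assumed).
    amb_assign[i] == n : channel i carries ambient coefficient sequence n.
    """
    if len(frames) != len(amb_assign):
        raise ValueError("one assignment entry per transport channel is required")
    ambient: Dict[int, Frame] = {}
    non_ambient: List[int] = []
    for channel, (frame, n) in enumerate(zip(frames, amb_assign)):
        if n > 0:
            if n in ambient:
                raise ValueError(f"coefficient index {n} assigned twice")
            ambient[n] = frame
        else:
            # Candidate for a predominant sound signal, or an unused channel.
            non_ambient.append(channel)
    return ambient, non_ambient
```

In this sketch, the keys of the returned dictionary are the indices of the ambient coefficient sequences that are present in the current frame.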

The spatial HOA decoding comprises steps of:

-   performing 910 inverse gain control, wherein said first perceptually decoded transport signals ${\overset{\frown}{z}}_{i}(k),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$ are transformed into first gain corrected signal frames ${\overset{\frown}{y}}_{i}(k),\,\, i = 1,\,\,...,\,\, O_{\text{MIN}}$ according to said first exponents e_(i)(k), i = 1, ..., O_(MIN) and said first exception flags β_(i)(k), i = 1, ..., O_(MIN), and wherein said second perceptually decoded transport signals ${\overset{\frown}{z}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$ are transformed into second gain corrected signal frames ${\overset{\frown}{y}}_{i}(k),\,\, i = O_{\text{MIN}} + 1,\,\,...,\,\, I$ according to said second exponents e_(i)(k), i = O_(MIN) + 1, ..., I and said second exception flags β_(i)(k), i = O_(MIN) + 1, ..., I;
-   redistributing 911 in a Channel Reassignment block 605 the first and second gain corrected signal frames ${\overset{\frown}{y}}_{i}(k),\,\, i = 1,\,\,...,\,\, I$ to I channels, wherein frames of predominant sound signals ${\overset{\frown}{X}}_{PS}(k)$ are reconstructed, the predominant sound signals comprising directional signals and vector based signals, and wherein a modified ambient HOA component C̃_(I, AMB)(k) is obtained, and wherein the assigning is made according to said ambient assignment vector ν_(AMB,ASSIGN)(k) and to information in said first and second tuple sets M_(DIR)(k + 1), M_(VEC)(k + 1);
-   generating 911b in the Channel Reassignment block 605 a first set of indices J_(AMB,ACT)(k) of coefficient sequences of the modified ambient HOA component that are active in the k^(th) frame, and a second set of indices J_(E)(k − 1), J_(D)(k − 1), J_(U)(k − 1) of coefficient sequences of the modified ambient HOA component that have to be enabled, disabled and to remain active in the (k-1)^(th) frame;
-   synthesizing 912 in the Predominant Sound Synthesis block 606 a HOA representation of the predominant HOA sound components ${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$ from said predominant sound signals ${\overset{\frown}{X}}_{PS}(k)$, wherein the first and second tuple sets M_(DIR)(k + 1), M_(VEC)(k + 1), the prediction parameters ξ(k+1) and the second set of indices J_(E)(k − 1), J_(D)(k − 1), J_(U)(k − 1) are used;
-   synthesizing 913 in the Ambient Synthesis block 607 an ambient HOA component ${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$ from the modified ambient HOA component C̃_(I, AMB)(k), wherein an inverse spatial transform for the first O_(MIN) channels is made and wherein the first set of indices J_(AMB,ACT)(k) is used, the first set of indices being indices of coefficient sequences of the ambient HOA component that are active in the k^(th) frame, wherein the ambient HOA component has one of at least two different configurations, depending on the layered mode indication LMF_(D); and
-   adding 914 the HOA representation of the predominant HOA sound components ${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$ and the ambient HOA component ${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$ in a HOA Composition block 608, wherein coefficients of the HOA representation of the predominant sound signals and corresponding coefficients of the ambient HOA component are added, and wherein the decompressed HOA signal $\overset{\frown}{C}\left( {k - 1} \right)$ is obtained, and wherein the following conditions apply:
    -   if the layered mode indication LMF_(D) indicates a layered mode with at least two layers, only the highest I-O_(MIN) coefficient channels are obtained by addition of the predominant HOA sound components ${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$ and the ambient HOA component ${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$, and the lowest O_(MIN) coefficient channels of the decompressed HOA signal $\overset{\frown}{C}\left( {k - 1} \right)$ are copied from the ambient HOA component ${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$;
    -   otherwise, if the layered mode indication LMF_(D) indicates a single-layer mode, all coefficient channels of the decompressed HOA signal $\overset{\frown}{C}\left( {k - 1} \right)$ are obtained by addition of the predominant HOA sound components ${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$ and the ambient HOA component ${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$.
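The mode-dependent rule of the adding step 914 can be illustrated with a short Python sketch. Frames are modelled as plain lists of rows, one row per HOA coefficient sequence; the function name `compose_hoa` and the boolean `layered` (standing in for the layered mode indication LMF_(D)) are hypothetical names chosen for this illustration only.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one row per HOA coefficient sequence


def compose_hoa(c_ps: Sequence[Sequence[float]],
                c_amb: Sequence[Sequence[float]],
                o_min: int,
                layered: bool) -> Matrix:
    """Combine predominant sound and ambient HOA components for one frame."""
    if len(c_ps) != len(c_amb):
        raise ValueError("both components need the same number of rows")
    out: Matrix = []
    for n, (ps_row, amb_row) in enumerate(zip(c_ps, c_amb)):
        if len(ps_row) != len(amb_row):
            raise ValueError("rows must have the same number of samples")
        if layered and n < o_min:
            # Layered mode: the lowest O_MIN rows are copied from the ambient
            # component, which already holds the full HOA coefficients there.
            out.append(list(amb_row))
        else:
            # All other rows (and every row in single-layer mode) are sums.
            out.append([p + a for p, a in zip(ps_row, amb_row)])
    return out
```

Copying instead of adding in the lowest O_(MIN) rows avoids counting the predominant sound contribution twice, since in the layered mode those rows of the ambient component are not residuals.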

The configuration of the ambient HOA component in dependence of the layered mode indication LMF_(D) is as follows:

If the layered mode indication LMF_(D) indicates a layered mode with at least two layers, the ambient HOA component comprises in its O_(MIN) lowest positions HOA coefficient sequences of the decompressed HOA signal

$\overset{\frown}{C}\left( {k - 1} \right)$

, and in remaining higher positions coefficient sequences being part of an HOA representation of a residual between the decompressed HOA signal

$\overset{\frown}{C}\left( {k - 1} \right)$

and the HOA representation of the predominant HOA sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

.

On the other hand, if the layered mode indication LMF_(D) indicates a single-layer mode, the ambient HOA component is a residual between the decompressed HOA signal

$\overset{\frown}{C}\left( {k - 1} \right)$

and the HOA representation of the predominant HOA sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

.

In one embodiment, the compressed HOA signal representation is in a multiplexed bitstream, and the method for decompressing the compressed HOA signal further comprises an initial step of demultiplexing the compressed HOA signal representation, wherein said compressed base layer bitstream

${\overset{\smile}{B}}_{BASE}(k)$

, said compressed enhancement layer bitstream

${\overset{\smile}{B}}_{ENH}(k)$

and said layered mode indication LMF_(D) are obtained.
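The combination of both layer bitstreams and the mode indication into a single bitstream, and the matching initial demultiplexing step, can be pictured with a toy Python sketch. The framing used here (a one-byte mode flag followed by two 32-bit big-endian length fields) is entirely hypothetical and is not the bitstream syntax of the described method or of any standard; `mux_layers` and `demux_layers` are illustrative names.

```python
import struct
from typing import Tuple

# Hypothetical header: mode flag (1 byte), base length, enhancement length.
_HEADER = ">BII"
_HEADER_SIZE = struct.calcsize(_HEADER)


def mux_layers(base: bytes, enhancement: bytes, layered: bool) -> bytes:
    """Pack base layer, enhancement layer and mode indication into one stream."""
    header = struct.pack(_HEADER, 1 if layered else 0, len(base), len(enhancement))
    return header + base + enhancement


def demux_layers(stream: bytes) -> Tuple[bytes, bytes, bool]:
    """Recover base layer, enhancement layer and mode indication."""
    if len(stream) < _HEADER_SIZE:
        raise ValueError("stream shorter than header")
    flag, n_base, n_enh = struct.unpack_from(_HEADER, stream, 0)
    base = stream[_HEADER_SIZE:_HEADER_SIZE + n_base]
    enhancement = stream[_HEADER_SIZE + n_base:_HEADER_SIZE + n_base + n_enh]
    if len(base) != n_base or len(enhancement) != n_enh:
        raise ValueError("truncated stream")
    return base, enhancement, bool(flag)
```

Because the base layer payload comes first and carries its own length in this toy framing, a receiver could stop reading after the base layer if the enhancement layer is not needed.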

FIG. 10 shows details of parts of an architecture of a spatial HOA decoding portion of a HOA decompressor according to one embodiment of the invention.

Advantageously, it is possible to decode only the BL, e.g. if no EL is received or if the BL quality is sufficient. For this case, signals of the EL can be set to zero at the decoder. Then, the redistributing 911 of the first and second gain corrected signal frames

${\overset{\frown}{y}}_{i}(k),\,\, i = 1,\,\,...,\,\, I$

to I channels in the Channel Reassignment block 605 is very simple, since the frames of predominant sound signals

${\overset{\frown}{X}}_{PS}(k)$

are empty. The second set of indices

J_(E)(k − 1),  J_(D)(k − 1),  J_(U)(k − 1)

of coefficient sequences of the modified ambient HOA component that have to be enabled, disabled and to remain active in the (k-1)^(th) frame is set to zero. The synthesizing 912 of the HOA representation of the predominant HOA sound components

${\overset{\frown}{C}}_{PS}\left( {k - 1} \right)$

from the predominant sound signals

${\overset{\frown}{X}}_{PS}(k)$

in the Predominant Sound Synthesis block 606 can therefore be skipped, and the synthesizing 913 of an ambient HOA component

${\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)$

from the modified ambient HOA component

C̃_(I, AMB)(k)

in the Ambient Synthesis block 607 corresponds to a conventional HOA synthesis.
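The base-layer-only case can be sketched in Python as follows: the frames of the missing enhancement layer channels are simply replaced by zeros, so that the later stages still see I channels. The function name `zero_fill_enhancement` is hypothetical, and the sketch covers only the zero filling, not the subsequent ambient synthesis.

```python
from typing import List, Sequence

Frame = List[float]


def zero_fill_enhancement(base_frames: Sequence[Sequence[float]],
                          total_channels: int) -> List[Frame]:
    """Return I frames: the received base layer frames followed by zero frames."""
    if not base_frames:
        raise ValueError("at least one base layer frame is required")
    if len(base_frames) > total_channels:
        raise ValueError("more base layer frames than transport channels")
    frame_len = len(base_frames[0])
    if any(len(frame) != frame_len for frame in base_frames):
        raise ValueError("all frames must have the same length")
    missing = total_channels - len(base_frames)
    # Each missing enhancement layer channel gets its own all-zero frame.
    zeros = [[0.0] * frame_len for _ in range(missing)]
    return [list(frame) for frame in base_frames] + zeros
```

With all enhancement layer channels zeroed, no predominant sound signal is present, which is why the predominant sound synthesis can be skipped in this case.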

The original (i.e. monolithic, non-scalable, non-layered) mode for the HOA compression may still be useful for applications where a low quality base layer bit stream is not required, e.g. for file based compression. A major advantage of perceptually coding the spatially transformed first O_(MIN) coefficient sequences of the ambient HOA component C_(AMB), which is a difference between the original and the directional HOA representation, instead of the spatially transformed coefficient sequences of the original HOA component C, is that in the former case the cross correlations between all signals to be perceptually coded are reduced. Any cross correlations between the signals z_(i), i = 1, ..., I may cause a constructive superposition of the perceptual coding noise during the spatial decoding process, while at the same time the noise-free HOA coefficient sequences are canceled at superposition. This phenomenon is known as perceptual noise unmasking.

In the layered mode, there are high cross correlations between each of the signals z_(i), i = 1, ..., O_(MIN) and also between the signals z_(i), i = 1, ..., O_(MIN) and z_(i), i = O_(MIN) + 1, ..., I, because the modified coefficient sequences of the ambient HOA component

c̃_(AMB,n) , n  = 1,  ...,  O_(MIN)

include signals of the directional HOA component (see eq. (3)). By contrast, this is not the case for the original, non-layered mode. It can therefore be concluded that the transmission robustness introduced by the layered mode may come at the expense of compression quality. However, the reduction in compression quality is low compared to the increase in transmission robustness. As has been shown above, the proposed layered mode is advantageous in at least the situations described above.

While there has been shown, described, and pointed out fundamental novel features of the present invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the apparatus and method described, in the form and details of the devices disclosed, and in their operation, may be made by those skilled in the art without departing from the spirit of the present invention. It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated.

It will be understood that the present invention has been described purely by way of example, and modifications of detail can be made without departing from the scope of the invention.

Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may, where appropriate, be implemented in hardware, software, or a combination of the two. Connections may, where applicable, be implemented as wireless connections or wired, not necessarily direct or dedicated, connections.

Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

Cited References

-   [1] EP12306569.0
-   [2] EP12305537.8 (published as EP2665208A)
-   [3] EP133005558.2
-   [4] ISO/IEC JTC1/SC29/WG11 N14264. Working draft 1-HOA text of MPEG-H 3D audio, January 2014

1. A method of decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or soundfield, the method comprising: receiving a bit stream containing the compressed HOA representation; determining that the bitstream comprises only a single layer; and decoding the compressed HOA representation from the layered bitstream to obtain a sequence of decoded HOA representations, wherein, for a frame k, the single layer is decoded based on an addition of a corresponding predominant HOA sound component $\left( {{\overset{\frown}{C}}_{PS}\left( {k - 1} \right)} \right)$ and a corresponding ambient HOA component $\left( {{\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)} \right)$.
2. (canceled)
3. An apparatus for decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or a soundfield, the apparatus comprising: a receiver for receiving a bit stream containing the compressed HOA representation; a processor for determining that the layered bitstream comprises only a single layer; and a decoder for decoding the compressed HOA representation from the layered bitstream to obtain a sequence of decoded HOA representations, wherein, for a frame k, the single layer is decoded based on an addition of a corresponding predominant HOA sound component $\left( {{\overset{\frown}{C}}_{PS}\left( {k - 1} \right)} \right)$ and a corresponding ambient HOA component $\left( {{\overset{\frown}{\widetilde{C}}}_{AMB}\left( {k - 1} \right)} \right)$.
4. (canceled)
5. A non-transitory computer readable storage medium containing instructions that when executed by a processor perform a method according to claim 1.